Tuning Java Garbage Collection for Spark Applications
Abstract
Spark is gaining wide industry adoption due to its superior performance, simple interfaces, and rich library for analysis and calculation. Like many projects in the big data ecosystem, Spark runs on the Java Virtual Machine (JVM). Because Spark can store large amounts of data in memory, it relies heavily on Java’s memory management and garbage collection (GC). New initiatives like Project Tungsten will simplify and optimize memory management in future Spark versions. But today, users who understand Java’s GC options and parameters can tune them to eke out the best performance from their Spark applications. This article describes how to configure the JVM’s garbage collector for Spark, and gives actual use cases that explain how to tune GC in order to improve Spark’s performance. We look at key considerations when tuning GC, such as collection throughput and latency.
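As a rough illustration of what configuring the JVM's garbage collector for Spark can look like, the sketch below assembles a value for Spark's `spark.executor.extraJavaOptions` setting and a matching `spark-submit` command line. The helper names, the application file `my_app.py`, and the threshold value 35 are assumptions made for this example and are not taken from the article; the JVM flags shown (`-XX:+UseG1GC`, `-XX:InitiatingHeapOccupancyPercent`, `-verbose:gc`) are standard HotSpot options, but suitable values depend on the workload.

```python
import shlex


def executor_gc_options(use_g1=True, ihop_percent=None, verbose=True):
    """Build a JVM option string for executors (hypothetical helper)."""
    flags = []
    if use_g1:
        # Select the G1 collector instead of the JVM's default collector.
        flags.append("-XX:+UseG1GC")
    if ihop_percent is not None:
        if not 0 <= ihop_percent <= 100:
            raise ValueError("ihop_percent must be between 0 and 100")
        # Heap occupancy (percent) at which G1 starts a concurrent cycle.
        flags.append(f"-XX:InitiatingHeapOccupancyPercent={ihop_percent}")
    if verbose:
        # Log each collection so pause times can be inspected later.
        flags.append("-verbose:gc")
    return " ".join(flags)


def spark_submit_args(app, gc_options):
    """Return an argument list that passes the GC options to executors."""
    return [
        "spark-submit",
        "--conf",
        f"spark.executor.extraJavaOptions={gc_options}",
        app,
    ]


# Illustrative values only; 35 is an assumed threshold, not a recommendation.
opts = executor_gc_options(ihop_percent=35)
cmd = spark_submit_args("my_app.py", opts)
print(shlex.join(cmd))
```

Executor heap size is set separately through `spark.executor.memory` rather than through these JVM options, so a sketch like this only covers collector selection and logging.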
Similar resources
Tuning J2EE Application Servers
Ever since the introduction of the J2EE Enterprise Application specification from Sun Microsystems, this new technology has experienced tremendous growth. This paper will discuss tuning approaches for J2EE application servers. After covering tuning options at the architectural level of a J2EE application, we discuss the impact of garbage collection and the JVM on application server performance,...
Tuning Java’s Memory Manager for High Performance Server Applications
Java is a strong player in the application server market and thus the performance of its virtual machine is an important aspect of a server’s performance. One of the components that affect the performance of a JVM is the memory manager, which also includes the garbage collector. Modern virtual machines offer a multitude of options for tuning the memory manager, which can have a significant impa...
Trash Day: Coordinating Garbage Collection in Distributed Systems
Cloud systems such as Hadoop, Spark and Zookeeper are frequently written in Java or other garbage-collected languages. However, GC-induced pauses can have a significant impact on these workloads. Specifically, GC pauses can reduce throughput for batch workloads, and cause high tail-latencies for interactive applications. In this paper, we show that distributed applications suffer from each node...
Java Garbage Collection Characteristics and Tuning Guidelines for Apache Hadoop TeraSort Workload
Adaptive Garbage Collection for Battery-Operated Environments
Energy is an important constraint for battery-operated embedded Java environments. In this work, we show how the garbage collector (GC) can be tuned to reduce the energy consumption of Java applications. In particular, we show the importance of tuning the frequency of invoking GC based on object allocation and garbage creation rates to optimize leakage energy consumption. We reduce the leakage ...